ABSTRACT

This paper evaluated multidimensional linking procedures with which
multidimensional test data from two separate calibrations were put
on a common scale. Data were simulated with known ability
distributions varying on two factors which made linking necessary:
mean vector differences and variance-covariance (v-c) matrix
differences. After the calibrations of multidimensional item
parameters, means of blocks of item parameter estimates were used
to equate the two groups. The linking was effective for mean vector
differences. The linking for v-c matrix differences was less
effective, but encouraging. Suggestions for future research are
provided. Four tables are attached. (Contains 6 references.)
Evaluation of Procedures for Linking
Multidimensional Item Calibrations

Unidimensional item response models have long been considered
somewhat unrealistic. Very few combinations of test items and
examinee populations can be reasonably argued to produce truly
unidimensional item response data. Fortunately, such arguments
have become less important, as fast, convenient, and inexpensive
multidimensional item calibration computer programs have become
more available (Fraser, 1987; Muthen, 1988; Wilson, Wood, & Gibbons,
1991). That these programs have not been more widely used is
perhaps due to a shortage of well-established applications to
practical testing problems. This is in contrast to the
unidimensional case, for which reliable and intensely researched
applications to test equating, test construction, item banking, and
detection of differential item functioning have been developed
(Lord, 1980).

A basic requirement of many practical applications is a way of
linking items calibrated on different samples of examinees onto a
common ability metric. While a variety of procedures have been
proposed for the unidimensional case, extensions to the
multidimensional case have not been carefully investigated. Davey
and Kashima (Davey, 1991) have proposed a general framework for
linking multidimensional calibrations. The purpose of this paper
is to assess the performance of the linking procedures under this
framework.
Linking multidimensional item calibrations

The compensatory (or linear) multidimensional item response
model (McKinley & Reckase, 1983) expresses the probability of a
correct response to item i by an examinee with ability
vector $\boldsymbol{\theta}_j$ as (see Note 1):

$$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j, \mathbf{a}_i, \mathbf{b}_i, c_i) = P_i(\boldsymbol{\theta}_j) = c_i + (1 - c_i)\, L[\mathbf{a}_i^T(\boldsymbol{\theta}_j - \mathbf{b}_i)]$$

where L(·) is either the logistic or normal distribution function,
the vectors $\mathbf{a}_i = \langle a_{i1}, a_{i2}, \ldots, a_{ik} \rangle$ and
$\mathbf{b}_i = \langle b_{i1}, b_{i2}, \ldots, b_{ik} \rangle$ characterize
item discrimination and difficulty with respect to the k ability
dimensions, and $c_i$ gives the probability of an examinee with very
low ability answering correctly by chance.

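Because item and ability parameters enter the logistic
function in the form $\mathbf{a}^T(\boldsymbol{\theta} - \mathbf{b})$, the vectors of discrimination
parameters $\mathbf{a}$ can be linearly transformed (rotated) to $\mathbf{a}^*$ by
premultiplying them by a nonsingular matrix $\mathbf{F}^T$, provided the
ability and difficulty vectors, $\boldsymbol{\theta}$ and $\mathbf{b}$, are correspondingly
premultiplied by the inverse, $\mathbf{F}^{-1}$. Similarly, difficulty parameters
can be translated by adding a vector of constants, $\mathbf{e}$, to each $\mathbf{b}$,
provided these same constants are added to ability vectors as well.
Thus, $\mathbf{a}^* = \mathbf{F}^T\mathbf{a}$, $\boldsymbol{\theta}^* = \mathbf{F}^{-1}\boldsymbol{\theta} + \mathbf{e}$, and $\mathbf{b}^* = \mathbf{F}^{-1}\mathbf{b} + \mathbf{e}$.
Because $\mathbf{a}^{*T}(\boldsymbol{\theta}^* - \mathbf{b}^*) = \mathbf{a}^T\mathbf{F}\mathbf{F}^{-1}(\boldsymbol{\theta} - \mathbf{b}) = \mathbf{a}^T(\boldsymbol{\theta} - \mathbf{b})$,
such transformations of the ability scale produce no net effect on
item response surfaces.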
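
The invariance described above can be checked numerically. The following is a minimal sketch, assuming the logistic form of L; all numerical values are arbitrary illustrations rather than values from the study, and the function names `m3pl_prob` and `transform` are hypothetical.

```python
import numpy as np

def m3pl_prob(theta, a, b, c=0.0):
    """Compensatory model: c + (1 - c) * logistic(a'(theta - b))."""
    z = a @ (theta - b)
    return c + (1.0 - c) / (1.0 + np.exp(-z))

def transform(theta, a, b, F, e):
    """Apply theta* = F^{-1} theta + e, a* = F'a, b* = F^{-1} b + e."""
    F_inv = np.linalg.inv(F)
    return F_inv @ theta + e, F.T @ a, F_inv @ b + e

# Arbitrary illustrative values (assumptions, not taken from the paper).
theta = np.array([0.3, -1.1])
a = np.array([1.2, 0.4])
b = np.array([0.5, -0.2])
c = 0.2
F = np.array([[1.0, 0.0], [0.29, 0.82]])  # any nonsingular matrix works
e = np.array([-0.5, -0.5])                # any translation vector works

theta_s, a_s, b_s = transform(theta, a, b, F, e)
p_before = m3pl_prob(theta, a, b, c)
p_after = m3pl_prob(theta_s, a_s, b_s, c)
print(p_before, p_after)  # the two probabilities agree up to rounding error
```

The guessing parameter c is left untouched because it does not depend on the ability metric.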

Indeterminacy in the latent trait model is usually resolved by 
requiring obtained parameter estimates to satisfy some number of 
conditions or constraints. Imposing these constraints "identifies"
the model and allows unique estimation of the remaining parameters. 
Unidimensional solutions are typically identified by setting the 
mean and variance of either the examinee ability or the item 
difficulty estimates to specified constants. For example, ability 
estimates may be scaled so as to have zero mean and unit variance.
Multidimensional models require not only that the location and 
scale of each ability axis be fixed, but also that the orientation 
of the axes be specified. The simplest way of specifying the 
orientation of the ability axes is to set one or more item 
discrimination parameters to zero on a given dimension. However, 
more elaborate constraints are possible, and in fact desirable. 
For example, the mean of sets of discrimination parameters can be 
set to specified constants. 

Estimating transformation parameters 
The sum and substance of scale linking is finding rotation 
matrices, F, and translation vectors, e, that take parameter 
estimates from separate calibrations to a common ability metric. 
For this to be possible, the separate calibrations must be based on 
common or randomly equivalent examinees, or include common items. 
The latter case is the principal focus here. The particular 
linking model considered regards one set of parameter estimates as 
a base that defines the ability space, while the second set of 
estimates is to be transformed to be consistent with that space. 
Davey (1991) suggested estimating scaling parameters by 
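simultaneously solving a set of scaling equations. Each scaling
equation sets some function of the common item parameter estimates
equal across calibrations by applying the proper choice of F and e
to one set of estimates. The left-hand side of each scaling
equation is a function of the parameter estimates from the base
calibration sample, while the right-hand side is the same function
of the other set of parameter estimates (or, more precisely, the
transformed versions of these estimates). More formally, let $\mathbf{a}_1$,
$\mathbf{b}_1$, and $\boldsymbol{\theta}_1$ denote the parameter estimates for the common items from
the base calibration sample, while $\mathbf{a}_2$, $\mathbf{b}_2$, and $\boldsymbol{\theta}_2$ represent
parameter estimates from the second calibration. The system of
scaling equations then takes the form:

$$\begin{aligned}
h_1(\mathbf{a}_1, \mathbf{b}_1, \boldsymbol{\theta}_1) &= h_1(\mathbf{F}^T\mathbf{a}_2,\; \mathbf{F}^{-1}\mathbf{b}_2 + \mathbf{e},\; \mathbf{F}^{-1}\boldsymbol{\theta}_2 + \mathbf{e}) \\
h_2(\mathbf{a}_1, \mathbf{b}_1, \boldsymbol{\theta}_1) &= h_2(\mathbf{F}^T\mathbf{a}_2,\; \mathbf{F}^{-1}\mathbf{b}_2 + \mathbf{e},\; \mathbf{F}^{-1}\boldsymbol{\theta}_2 + \mathbf{e}) \\
&\;\;\vdots \\
h_q(\mathbf{a}_1, \mathbf{b}_1, \boldsymbol{\theta}_1) &= h_q(\mathbf{F}^T\mathbf{a}_2,\; \mathbf{F}^{-1}\mathbf{b}_2 + \mathbf{e},\; \mathbf{F}^{-1}\boldsymbol{\theta}_2 + \mathbf{e})
\end{aligned}$$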

where q is the number of elements of F and e to be estimated. The
resulting, generally nonlinear, equations are solved simultaneously
for the unknown elements of the rotation matrix F and the
translation vector e.
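
As a rough illustration of how such a system might be solved, the sketch below uses `scipy.optimize.fsolve` as a general-purpose root finder for a two-dimensional case with q = 6. The particular scaling functions (mean discrimination vectors in two item blocks plus the overall mean difficulty vector), the block assignment, the simulated noise-free parameters, and all function names are assumptions made for this example; they are not the equations used in the study.

```python
import numpy as np
from scipy.optimize import fsolve

def scaling_functions(a, b, blocks):
    """Six illustrative functions: mean of a within each block, mean of b overall."""
    return np.concatenate([a[blocks[0]].mean(axis=0),
                           a[blocks[1]].mean(axis=0),
                           b.mean(axis=0)])

def residuals(params, a_base, b_base, a_focal, b_focal, blocks):
    """Left-hand side minus right-hand side of the six scaling equations."""
    F = params[:4].reshape(2, 2)
    e = params[4:]
    a_star = a_focal @ F                        # each row is (F'a)'
    b_star = b_focal @ np.linalg.inv(F).T + e   # each row is (F^{-1} b + e)'
    return (scaling_functions(a_base, b_base, blocks)
            - scaling_functions(a_star, b_star, blocks))

def estimate_link(a_base, b_base, a_focal, b_focal, blocks):
    start = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # F = identity, e = 0
    sol = fsolve(residuals, start,
                 args=(a_base, b_base, a_focal, b_focal, blocks))
    return sol[:4].reshape(2, 2), sol[4:]

# Noise-free demonstration with hypothetical item parameters.
rng = np.random.default_rng(0)
a_focal = np.vstack([
    np.column_stack([rng.uniform(0.8, 1.6, 20), rng.uniform(0.1, 0.5, 20)]),
    np.column_stack([rng.uniform(0.1, 0.5, 20), rng.uniform(0.8, 1.6, 20)]),
])
b_focal = rng.normal(0.0, 1.0, size=(40, 2))
F_true = np.array([[1.0, 0.0], [0.29, 0.82]])
e_true = np.array([-0.5, -0.5])
a_base = a_focal @ F_true
b_base = b_focal @ np.linalg.inv(F_true).T + e_true
blocks = (np.arange(20), np.arange(20, 40))

F_hat, e_hat = estimate_link(a_base, b_base, a_focal, b_focal, blocks)
```

With error-free parameters the solver should recover `F_true` and `e_true`; with estimated parameters the solution inherits the sampling error of whatever functions are equated.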

While any properly structured set of scaling functions can
serve to estimate scaling parameters (see Note 2), it is believed that more
stable estimates will be obtained if the scaling equations
themselves are stable functions of the item parameter estimates.
For example, equating the means of a large number of item parameter
estimates is preferable to equating single estimates.

Methods 

Design 

This study concentrated on linking items calibrated under a 
two-dimensional model on samples from base and focal examinee 
populations. Using a compensatory multidimensional two-parameter 
logistic model, 40-item two-dimensional data sets were generated 
for the base and focal groups. Two factors were considered in this 
study to make scale linking necessary. The first factor, the 
difference between the mean vectors for the two groups, had three 
levels (no differences, small differences, and large differences). 
The base group always had a mean vector of [0, 0]. On the other 
hand, the focal group had three levels of the mean vector, (a) [0, 
0], (b) [-.5, -.5], and (c) [ -1, -1]. 

The second factor, the difference between the variance- 
covariance (v-c) matrix, also had three levels (no transformation, 
an orthogonal transformation, and an oblique transformation). The 
base group always had the v-c matrix of [1 .5, .5 1]. For the
focal group, the v-c matrix was either (a) [1 .5, .5 1], which
required no transformation, (b) [.8 .4, .4 .8], in which the
variances were smaller but the correlation was the same as in
the base group, requiring an orthogonal transformation, or (c)
[1 .7, .7 1], in which only the correlation differed, thus requiring an
oblique transformation.
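
The "True" values of F listed in Table 3 (0.89 on the diagonal for the orthogonal case; 1.00, 0.00, 0.29, 0.82 for the oblique case) are numerically consistent with the product of the focal group's lower Cholesky factor and the inverse of the base group's lower Cholesky factor. Whether the authors derived them this way is an assumption; the sketch below only shows that the numbers agree. The function name `link_matrix_from_cov` is hypothetical.

```python
import numpy as np

def link_matrix_from_cov(cov_base, cov_focal):
    """Return T_focal @ inv(T_base), where each T is a lower Cholesky factor."""
    t_base = np.linalg.cholesky(cov_base)    # cov_base = t_base @ t_base.T
    t_focal = np.linalg.cholesky(cov_focal)
    return t_focal @ np.linalg.inv(t_base)

cov_base = np.array([[1.0, 0.5], [0.5, 1.0]])
cov_levels = {
    "a (no transformation)": np.array([[1.0, 0.5], [0.5, 1.0]]),
    "b (orthogonal)":        np.array([[0.8, 0.4], [0.4, 0.8]]),
    "c (oblique)":           np.array([[1.0, 0.7], [0.7, 1.0]]),
}

link_matrices = {name: link_matrix_from_cov(cov_base, cov)
                 for name, cov in cov_levels.items()}
for name, F in link_matrices.items():
    print(name)
    print(np.round(F, 2))
```

By construction each matrix F computed this way satisfies `F @ cov_base @ F.T == cov_focal` up to rounding.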
The two factors were completely crossed (3 x 3). Table 1
summarizes the nine conditions (C1 - C9). In each condition,
examinees were drawn from either the base or the focal group, and
responses to a common set of items were generated. Item parameters
were then calibrated independently using NOHARM (Fraser, 1987) on
both samples. Finally, the focal ability metric was linked to that
of the base either by (a) a "poor" linking method or (b) a "good"
linking method. These linking methods are described in the next
section. In each condition, this process was repeated 20 times to
produce distributions of linking and item parameter estimates.
Consequently, 180 pairs (i.e., base and focal groups) of data sets
were analyzed.



Insert Table 1 about here. 



Linking Methods 

Using a two-dimensional model, F was a 2 x 2 matrix and e was
a two-element vector. No restrictions were imposed on the structure
of F and e (see Note 3), so a total of six scaling parameters were to be
estimated, with six scaling equations required to do so. Two
different methods for obtaining scaling equations were used. The
first method, a "poor" linking method, equated only individual item
parameter estimates across calibrations, and was expected to yield poor
estimates of the scaling parameters. More specifically, the six
scaling equations were set using only two items of the test. The
six equations are:

$$\begin{aligned}
a_{1b2} &= a^*_{1f2}, & a_{2b2} &= a^*_{2f2}, \\
b_{1b2} &= b^*_{1f2}, & b_{2b2} &= b^*_{2f2}, \\
a_{1b3} &= a^*_{1f3}, & a_{2b3} &= a^*_{2f3},
\end{aligned}$$

where $a_1$ and $a_2$ stand for the elements of the a vector and $b_1$ and
$b_2$ the elements of the b vector. The subscript 'b' or 'f' indicates
the base or focal group, respectively. The last subscript
indicates the item. Items 2 and 3 were used. The star * indicates
transformed values. The choice of two specific items was arbitrary,
except for avoiding Item 1, which always had a fixed $a_2$ in the NOHARM
estimates.
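
For this particular set of six equations the unknowns can be obtained directly by linear algebra, which makes it easy to see that F depends on only two discrimination vectors and e on a single difficulty vector. The sketch below is an illustration under that observation, not the solver used in the study (which applied a Newton method in SAS/ETS); the function name, the zero-based indexing, and the simulated parameters are assumptions.

```python
import numpy as np

def link_from_two_items(a_base, b_base, a_focal, b_focal, items=(1, 2)):
    """Solve the six single-item equations for F (2 x 2) and e (length 2).

    Rows of the parameter arrays are items; `items` holds zero-based row
    indices, so (1, 2) corresponds to Items 2 and 3.
    """
    i, j = items
    # a_base[k] = F' a_focal[k] for k in {i, j}, i.e. rows_focal @ F = rows_base.
    F = np.linalg.solve(a_focal[[i, j]], a_base[[i, j]])
    # b_base[i] = F^{-1} b_focal[i] + e, using the first of the two items only.
    e = b_base[i] - np.linalg.solve(F, b_focal[i])
    return F, e

# Noise-free check with hypothetical parameters.
rng = np.random.default_rng(3)
a_focal = rng.uniform(0.2, 1.5, size=(40, 2))
b_focal = rng.normal(0.0, 1.0, size=(40, 2))
F_true = np.array([[1.0, 0.0], [0.29, 0.82]])
e_true = np.array([-0.5, -0.5])
a_base = a_focal @ F_true
b_base = b_focal @ np.linalg.inv(F_true).T + e_true

F_two, e_two = link_from_two_items(a_base, b_base, a_focal, b_focal)
```

Because every element of F and e is tied to the estimates for just two items, any estimation error in those items passes straight into the transformation applied to the whole test.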

The second method, a "good" linking method, equated the means 
of the blocks of item parameter estimates. In this study, all the 
items were used for the six scaling equations. The six equations 
are: 
Again, the choice of the above six equations was arbitrary. The
simultaneous equations for both linking methods were solved
using SAS/ETS, in which the Newton method was used to solve the
equations.

Data Generation

Item parameters from a real test were used. The parameters are
shown in Table 2. These parameters are estimates from a 1992 form
of the ACT Assessment Mathematics test. Ability parameters ($\theta_1$ and
$\theta_2$) were simulated from a random normal distribution. The mean
vector varied from condition to condition as described earlier.
The variance was also varied for some conditions. The appropriately
correlated $\theta_1$ and $\theta_2$ for each condition were simulated by first
generating two independent normally distributed pseudorandom
variables $z_1$ and $z_2$ and then transforming them to $\theta_1$ and $\theta_2$ by
weighted linear transformations. The weights were the elements of
$\mathbf{T}^T$, where $\mathbf{T}$ is a matrix which satisfies $\mathbf{R} = \mathbf{T}^T\mathbf{T}$ and $\mathbf{R}$ is the target
correlation matrix. The sample size of each group was 1000.
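
The generation steps can be sketched as follows. This is an illustration only: the item parameters are random placeholders rather than the Table 2 values, the model is written in the reduced a'θ + d form, the v-c matrix is factored directly instead of a correlation matrix, and the function names are hypothetical. NumPy's lower Cholesky factor plays the role of the weight matrix $\mathbf{T}^T$.

```python
import numpy as np

def simulate_abilities(n, mean, cov, rng):
    """Draw n ability vectors with the given mean vector and v-c matrix."""
    weights = np.linalg.cholesky(cov)          # lower triangular; cov = weights @ weights.T
    z = rng.standard_normal((n, len(mean)))    # independent N(0, 1) variables
    return z @ weights.T + np.asarray(mean)

def simulate_responses(theta, a, d, rng):
    """Compensatory two-parameter logistic model with logit a'theta + d."""
    p = 1.0 / (1.0 + np.exp(-(theta @ a.T + d)))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(1)
n_items = 40
a = rng.uniform(0.2, 1.5, size=(n_items, 2))   # placeholder discriminations
d = rng.normal(0.0, 1.0, size=n_items)         # placeholder intercepts

# Focal group with mean vector [-.5, -.5] and v-c matrix [1 .7, .7 1].
cov_focal = np.array([[1.0, 0.7], [0.7, 1.0]])
theta_focal = simulate_abilities(1000, [-0.5, -0.5], cov_focal, rng)
u_focal = simulate_responses(theta_focal, a, d, rng)
```

The resulting 1000 x 40 matrix of 0/1 responses is the kind of data set that would then be passed to the calibration program.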



Insert Table 2 about here. 



Analysis 

The linking procedures were compared at two levels. First, the 
mean linking parameter estimates from 20 replications were compared 
to true linking parameters. These true values were known since the 
population ability distributions were specified. Second, 
transformed item parameter estimates from the focal group were
compared with true item parameters using root mean square error
(RMSE). In addition, RMSE between the focal group item parameter
estimates before transformation and true item parameters was 
obtained to establish the "no linking" baseline condition. 
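
The RMSE summary can be sketched as below. The helper names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) across replications is an assumption, since the paper does not state which form was used.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean square error between estimated and true parameter values."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def summarize(rmse_values):
    """Mean and sample standard deviation of RMSE over replications."""
    values = np.asarray(rmse_values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))

# Toy example: one parameter type, three items, two replications.
truth = [1.0, 2.0, 5.0]
replications = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]
per_rep = [rmse(est, truth) for est in replications]
mean_rmse, sd_rmse = summarize(per_rep)
```

In the study this computation would be carried out separately for each parameter type and each linking condition, including the untransformed "no linking" baseline.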

Results 

Table 3 shows the comparison of true and estimated linking
parameters. The elements of the matrix F are expressed as f1, f2,
f3, and f4, and those of the vector e as e1 and e2. Clearly, the
"poor" linking method produced linking parameter estimates which
are drastically different from true parameters. The estimates
varied from replication to replication, as indicated by rather large
standard deviations. On the other hand, estimates from the "good"
linking method are more consistent with true parameters. In
addition, the smaller standard deviations over the replications
suggest that the method produces fairly stable estimates.

Insert Table 3 about here.



The effects of the mean vector difference and the v-c matrix
difference can be observed by comparing results from C1 through C9
for the "good" linking method. The mean vector differences are
reflected in e1 and e2. The mean vector difference also affected
the estimates of F. As the mean vector difference became large,
estimates of F deviated more from true parameters (see C1, C4, and
C7). As expected, the v-c matrix affected the estimates of F.
However, the estimates are not quite consistent with true
parameters for both orthogonal and oblique transformations.

Table 4 summarizes RMSEs between true and estimated item
parameters after transformation. Reported are the mean and
standard deviation of RMSE over the 20 replications. Again, it is
obvious that the "poor" linking method produced item parameters
which were very different from the true parameters. The mean RMSEs
are large and the standard deviations are also large. Furthermore,
the "poor" linking method was worse than no linking at all. The
"good" linking method, on the other hand, appears to be an
improvement over no linking except in the orthogonal condition. The
improvement was most obvious when there was a mean vector
difference. Without linking, the deviation of the d estimates can be
serious. Throughout the conditions, the means and standard
deviations of RMSE are fairly similar for the "good" linking method,
suggesting that the effects of different mean vectors and v-c matrices
are controlled somewhat via transformation.



Insert Table 4 about here.



Also noted in Tables 3 and 4 are the convergence problems. 
There were three cases (out of 360 cases) of convergence problems 
in the calibration process using NOHARM, and one case of 
convergence problems (out of 720 cases) in the linking process. 
The first occurred when examinee abilities were low (C8-C9), 
because one of the items (Item 39) was very difficult (d = -3.77) 
resulting in the p-value of zero. The second occurred for no 
obvious reasons. The simultaneous equations produced no solution 
for the particular case. 

Discussion 

The results of this study indicate that the degree to which
item parameters from multiple calibrations of multidimensional test
data are put on a common scale depends on how the scaling
equations are selected to calculate the linking parameter
estimates. When the equations are set such that only a small
portion of the test is used, the estimates for the linking
parameters can be seriously distorted. In fact, in our example of
the "poor" linking method, where only two items were used for
linking, the use of linking did more harm than good. This
phenomenon is understandable, because a small portion of items is
hardly representative of the entire test, and linking estimates
from any deviant items within the small portion can distort the
remaining item parameter estimates via an undesirable
transformation.

The more important question is how to choose the scaling 
equations using the entire test. In this study, all the items were 
used and means of blocks were utilized for the solution to the 
scaling equations for the "good" linking method. However, there 
are many ways to choose the equations. Instead of using means for 
all the six equations, for example, standard deviations of item 
parameters can be used in some of the equations. There is a need
for further studies in which various combinations of equations are
compared to produce optimal linking parameter estimates. In our
results, the benefit of linking (i.e., the "good" linking method) 
was most obvious when there was a mean vector difference. When 
there was a v-c matrix difference, the results were not as 
straightforward as those from a mean vector difference. An oblique 
transformation seemed to produce item parameter estimates closer to 
true item parameters than an orthogonal transformation. In either 
case, however, the use of linking had no serious negative effect. 

This study involved two stages of estimation. First, item
parameters were estimated using NOHARM. Second, these item
parameter estimates were used to estimate linking parameters.
Therefore, any deviations of linking parameter estimates or item
parameter estimates after transformation from true parameter values
can be attributed to either the recovery ability of NOHARM or the
estimation ability of the linking method, as well as to sampling
error. It appeared in our data that estimation with NOHARM was
difficult for some correlated θs, resulting in some extreme item
parameter estimates. These extreme estimates in turn affected the
estimation of linking parameters.

The generalizability of the results is limited to the
conditions examined in this study. Only one kind of
multidimensional structure was used. Future studies need to focus
on different multidimensional structures as well as different
selections of scaling equations. In addition, there is a need for
investigating the behavior of NOHARM in a comprehensive study.
Notes

1 Because the individual elements of b cannot be uniquely
determined, the function argument is usually written as $\mathbf{a}^T\boldsymbol{\theta} + d$,
where $d = -\mathbf{a}^T\mathbf{b}$. However, the full parameterization is
more convenient for purposes of scale linking. The transformation
from reduced to full parameterization is arbitrary, in the sense
that there are an infinite number of possible transformations. An
especially intuitive transformation is the following, which
distributes an item's difficulty across ability dimensions in
proportion to the item's discrimination with respect to those
dimensions:
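
The expression itself is not legible in this copy. One formula that fits the description, offered here as an assumption rather than as the authors' own expression, makes $\mathbf{b}_i$ proportional to $\mathbf{a}_i$ while preserving $d_i = -\mathbf{a}_i^T\mathbf{b}_i$:

```latex
\mathbf{b}_i = -\frac{d_i}{\mathbf{a}_i^T \mathbf{a}_i}\,\mathbf{a}_i ,
\qquad\text{so that}\qquad
\mathbf{a}_i^T \mathbf{b}_i = -\frac{d_i}{\mathbf{a}_i^T \mathbf{a}_i}\,\mathbf{a}_i^T \mathbf{a}_i = -d_i .
```

Under this choice each element $b_{ik}$ equals $-d_i\,a_{ik} / \sum_m a_{im}^2$, so a dimension on which the item discriminates more strongly carries a proportionally larger share of the difficulty.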

2 The equations need only be independent and include as 
unknowns each of the scaling parameters. 

3 If, for example, F were required to be orthogonal, only
three of its elements need be estimated, the fourth being
determined. Other constraints on F and e can be considered.
Table 2. True Item Parameters

Table 3. Comparison of True and Estimated Linking Parameters (20
Replications in Each Condition)

(a) C1-C3

                                     Estimated
                              Poor               Good
Condition  F/e     True      M     (SD)        M     (SD)

C1         f1       --      --      --        --      --
           f2       --      --      --        --      --
           f3       --      --      --        --      --
           f4       --      --      --        --      --
           e1       --      --      --        --      --
           e2       --      --      --        --      --

C2         f1       --      --      --        --      --
           f2       --      --      --        --    ( .13)
           f3       --      --   ( 3.81)      .14   ( .20)
           f4       --      --   ( 3.12)     1.24   ( .19)
           e1       --      --   (17.12)     -.01   ( .05)
           e2       --      --   (48.03)     -.03   ( .06)

C3*        f1       --      --   ( 1.58)     1.03   ( .15)
           f2       --      --   (  .80)      .07   ( .24)
           f3       --      --   ( 4.36)     -.24   ( .24)
           f4       --      --   ( 2.20)      .84   ( .19)
           e1       --      --   ( 1.34)      .02   ( .06)
           e2       --      --   ( 4.22)     -.04   ( .06)

-- = entry not legible in the source.

* C3 for the "good" linking method is based on 19 replications
due to a convergence problem in one of the 20 replications in
calculating linking parameters.
(b) C4-C6

                                     Estimated
                              Poor               Good
Condition  F/e     True      M     (SD)        M     (SD)

C4         f1      1.00     1.23     --        --      --
           f2      0.00     -.09     --        --      --
           f3      0.00     -.61  ( 4.38)     -.10   ( .22)
           f4      1.00     1.33  ( 4.66)      .95   ( .14)
           e1     -0.50     -.62  (  .46)     -.50   ( .11)
           e2     -0.50     -.38  ( 1.07)     -.46   ( .19)

C5         f1      0.89     1.23  ( 1.44)     1.13   ( .15)
           f2      0.00      .50  ( 1.87)     -.10   ( .12)
           f3      0.00     -.16  ( 5.34)      .07   ( .30)
           f4      0.89     -.42  ( 5.01)     1.36   ( .24)
           e1     -0.50     -.75  (  .63)     -.58   ( .12)
           e2     -0.50     -.14  ( 1.30)     -.24   ( .20)

C6         f1      1.00     3.76  (17.24)     1.06   ( .14)
           f2      0.00      .88  ( 4.38)      .00   ( .08)
           f3      0.29    -8.61  (51.15)     -.29   ( .27)
           f4      0.82    -1.77  (12.72)      .88   ( .20)
           e1     -0.50     -.87  (  .55)     -.49   ( .12)
           e2     -0.50      .25  ( 1.43)     -.60   ( .31)

-- = entry not legible in the source.
(c) C7-C9

                                     Estimated
                              Poor               Good
Condition  F/e     True      M     (SD)        M     (SD)

C7         f1      1.00      .34  ( 3.83)      .91   ( .14)
           f2      0.00      .67     --        --      --
           f3      0.00     3.33  ( 9.48)      .21   ( .33)
           f4      1.00     -.74  ( 9.73)     1.31   ( .29)
           e1     -1.00     -.86  ( 1.41)    -1.28   ( .17)
           e2     -1.00    -1.45  ( 3.95)     -.35   ( .40)

C8*        f1      0.89    -1.22  ( 8.70)     1.05   ( .17)
           f2      0.00    -2.74  (13.80)     -.11   ( .17)
           f3      0.00     8.15  (32.42)      .29   ( .35)
           f4      0.89    11.96  (51.34)     1.56   ( .29)
           e1     -1.00    -1.20  (  .24)    -1.23   ( .21)
           e2     -1.00     -.50  (  .31)     -.20   ( .39)

C9*        f1      1.00      .81  (  .68)      .99   ( .15)
           f2      0.00      .50  (  .81)     -.08   ( .14)
           f3      0.29      .39  ( 1.76)     -.15   ( .33)
           f4      0.82     -.29  ( 2.08)     1.09   ( .25)
           e1     -1.00    -1.34  (  .63)    -1.11   ( .23)
           e2     -1.00     -.09  ( 1.64)     -.75   ( .50)

-- = entry not legible in the source.

* C8 and C9 are based on 18 and 19 replications, respectively,
due to a convergence problem in estimating item parameters using
NOHARM.



Table 4. Root Mean Square Error Between True and Estimated Item
Parameters after Linking (20 Replications in Each Condition)

                   Poor                    Good                 No Linking
              a1      a2      d       a1      a2      d       a1      a2      d

C1    M      4.04    3.32    5.41     .56     .41     .29     .59     .43     --
      SD      --      --      --      --      --      --      .16     .14     .10
C2    M       --      --      --      --      --      --      --      .27     .23
      SD     2.04    1.53    2.60     .13     .11     .08     .10     .10     .10
C3*   M      3.83    1.28    2.89     .61     .47     .39     .88     .57     .40
      SD     2.61    1.26    3.23     .12     .14     .14     .24     .20     .15
C4    M      2.21    1.88    2.11     .59     .47     .31     .65     .52    1.45
      SD      --      --      --      --      --      --      --      --      --
C5    M      2.06    2.08    2.78     .54     .45     .27     .35     .32    1.22
      SD      --      --      --      .12     .07     .06     .08     .06     .09
C6    M     10.65    3.12    6.90     .66     .53     .40     .95     .68    1.67
      SD    32.94    4.98   15.57     .13     .12     .12     .30     .24     .33
C7    M      3.67    2.96    4.49     .58     .47     .30     .48     .40    2.37
      SD     6.18    6.86    8.32     .12     .09     .05     .15     .07     .21
C8*   M      5.78    8.51    8.78     .58     .49     .30     .31     .36    2.19
      SD    21.16   32.83   34.28     .13     .08     .03     .09     .07     .14
C9*   M      1.45    1.22    1.39     .64     .49     .36     .83     .51    2.75
      SD      .87    1.12    1.36     .12     .13     .09     .23     .18     .36

-- = entry not legible in the source.

* C3 and C9 are based on 19 replications. C8 is based on 18
replications. See Table 3 for explanations.